ABSTRACT


Let $X_1, X_2, \ldots, X_n$ be n independent random variables
each distributed uniformly over the interval (0,1) and
let $D_1, D_2, \ldots, D_{n+1}$ be the respective spacings of the
(n+1) intervals into which the unit interval is divided.
There exists a class of statistical problems related to
finding the distribution of certain functions of the $D_i$.
A review of these statistical tests is given in Chapter II.

In deriving the properties of the statistics, it
is usually assumed that, because the underlying distribution
of $D_i$ is continuous, the problem of tied observations
does not arise. However, this is a problem the
statistician has to face in actual data situations. In
this study we have evaluated the sensitivity of the
test statistics to ties and grouping errors. We have
also considered how robust the statistics are by evaluating
their sensitivity under the $\chi^2_2$ and normal (0,1)
distributions. The results of this test for robustness
will give the applied statistician a valid basis for
using the statistics in tests of hypotheses other than
those involving the uniform (0,1) distribution.

For completeness we have also looked at ties and
grouping errors under discrete distributions. In this
case the sensitivity of the statistics to the different
tie breaking rules is found to depend quite heavily on
the number of ties occurring in the data. The Monte
Carlo method was used in attacking the problem and
results are given in Chapters IV and V.


CHAPTER I


INTRODUCTION


In the literature, several types of distribution-free
goodness-of-fit tests have been studied, including
those based on sample spacings. Sample spacings refer
to the differences between successive ordered observations
of a sample. If we let $\{T_i ; i \ge 0\}$ be a sequence
of random variables for which $T_0 \le T_1 \le \cdots$ then the
spacings are the differences $D_i = T_i - T_{i-1}$,
$i = 1, 2, \ldots$.

Let us suppose that for a fixed sample size, one
is given independent random variables $X_1, X_2, \ldots, X_n$ with
common density function $f$. Then we define $T_1 \le T_2 \le \cdots \le T_n$
to be the order statistics of the sample and consider
the spacings $D_i = T_i - T_{i-1}$. The range for i is $2 \le i \le n$
unless the support of the underlying distribution function
$F$ indicates that spacings $D_1$ and/or $D_{n+1}$ may be
defined. For example, the spacings $D_1$ and $D_{n+1}$ are both
defined for a uniform (0,1) distribution, but only $D_1$ is
defined for a chi-square distribution with any degree of
freedom and neither is defined for the normal distribution.
The usual situation in the case outlined above is
a hypothesis-testing one in which the two hypotheses are

single distribution function which can therefore be
assumed to be uniform on (0,1). This is because if
we have a set of n observations $x_i$ from a distribution
function $F_0(x)$ with density function $f_0(x)$ then, using
the probability integral transformation, the $x_i$'s can
be transformed to a new set $y_1, \ldots, y_n$ where the variable

$$y = \int_{-\infty}^{x} f_0(u)\,du = F_0(x)$$

is uniformly distributed on (0,1).
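
The probability integral transformation can be illustrated with a minimal Python sketch. The exponential choice of $F_0$, the function names and the sample size are assumptions made for illustration only; they are not taken from the study.

```python
import math
import random


def exponential_cdf(x, rate=1.0):
    """F_0(x) = 1 - exp(-rate * x) for x > 0 (an illustrative choice of F_0)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def probability_integral_transform(sample, cdf):
    """Map each observation x_i to y_i = F_0(x_i)."""
    return [cdf(x) for x in sample]


rng = random.Random(12345)
x = [rng.expovariate(1.0) for _ in range(5000)]   # draws from F_0
y = probability_integral_transform(x, exponential_cdf)

# If y is uniform on (0,1) its mean is near 1/2 and its variance near 1/12.
mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / (len(y) - 1)
print(round(mean_y, 3), round(var_y, 3))
```

Any continuous $F_0$ could be substituted for the exponential one; only the cdf function passed to the transform changes.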

The use of the above spacings model for tests of
fit has been suggested by several authors in the literature.
The survey papers by Pyke (1965 and 1970) list,
with some examples, the two main types of statistics which
have been used in the past to provide tests based on
spacings. The two types are:

1) Either a sum of the form $G_n = \sum_i g_n(D_i)$, where $g_n(D_i)$
is a function of the spacings, or
2) A function of the ordered spacings and their ranks,
for example $C_{n+1}/C_1$ or $C_{n+1} - C_1$ (suggested by Kendall in
Greenwood (1946)), where $C_1 < C_2 < \cdots < C_{n+1}$ are the
ordered spacings.

In the literature, tests are developed under the
assumption that the appropriate mathematical model
involves random variables with continuous distribution
functions. With this assumption it is unnecessary to
deal with the problem of tied observations because it
is assumed that $\Pr(T_i = T_{i-1}) = 0$ for all i. However,
in actual data situations ties do occur and it is therefore
important that the statistician be informed of the
consequences of some of the alternatives. Grouping also
becomes inevitable in real data situations when the
number of possible results is exceedingly large. When
this happens some of the results are grouped together
to facilitate computation and better description of the
population from which the data are obtained. In this
work we are therefore concerned with investigating the
sensitivity to ties and grouping of some of the tests
proposed by Greenwood (1946), Sherman (1950) and Darling
(1953). We define the rule for tie breaking and grouping
as follows: all spacings $D_i$ which are less than or equal
to some constant $\delta$ are set equal to a new constant $\varepsilon$, i.e.
$D_i = \varepsilon$ for all $D_i \le \delta$. For tie breaking only, $\varepsilon$ may be
larger than $\delta$. However, when grouping is also considered
then $\varepsilon$ is made equal to $\delta$. In Fortran IV language the
rule is: IF ($D_i \le \delta$) $D_i = \varepsilon$.
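
The rule stated above can be written as a short Python sketch. The function name and the example spacings are hypothetical and serve only to show the two cases (grouping, and tie breaking only).

```python
def apply_tie_rule(spacings, delta, epsilon):
    """Set every spacing D_i <= delta equal to epsilon; leave the rest unchanged."""
    return [epsilon if d <= delta else d for d in spacings]


spacings = [0.0, 0.0004, 0.012, 0.25, 0.001]   # hypothetical spacings

# Grouping: epsilon is made equal to delta.
grouped = apply_tie_rule(spacings, delta=0.001, epsilon=0.001)
print(grouped)      # [0.001, 0.001, 0.012, 0.25, 0.001]

# Tie breaking only: epsilon may be larger than delta.
tie_broken = apply_tie_rule(spacings, delta=0.001, epsilon=0.002)
print(tie_broken)   # [0.002, 0.002, 0.012, 0.25, 0.002]
```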

The above rule for tie breaking and grouping is
motivated by the fact that digital computers are able
to perform the arithmetic operations involving the
statistics only on a limited set of rational numbers. This
apparent defect of the computer gives rise to
computational errors such as round-off and approximation.
Round-off errors arise because an infinite number
of binary digits cannot be used to represent a real
number, owing to the finite word length of the computer,
and because most rational numbers require
an infinite number of bits for exact representation.
This approximation of the operands in an operation gives
rise to approximate results which, if used as operands in
other computations, will cause the error to be propagated
to another expression and so on. Approximation errors
may also be generated because the operations involved in
the computation of the various statistics are only approximate.
This is so even for addition and subtraction,
not to mention division and multiplication, in floating-point
arithmetic. As a result of these errors, the
spacing $D_i$ may not be exactly equal to zero when it is
computed, though it may be close to zero. In order to
account for this the constant $\delta$ is set close to zero

(eg Don?) so that the rule for tie breaking and grouping 
stated above can be applied. 
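
A small Python illustration of the round-off effect described above follows. The tolerance value is a hypothetical stand-in chosen for this sketch; it is not the value of $\delta$ used in the study.

```python
# Two observations that are tied in exact arithmetic (0.1 + 0.2 = 0.3)
# need not be tied once they are stored as binary floating-point numbers.
a = 0.1 + 0.2
b = 0.3
spacing = a - b

TOLERANCE = 1e-9   # hypothetical small delta, assumed for this sketch only

print(spacing == 0.0)              # False: the computed spacing is not exactly zero
print(abs(spacing) <= TOLERANCE)   # True: it is nevertheless close to zero
```

Comparing the spacing with a small positive constant, rather than with zero, is what allows the tie breaking and grouping rule to catch such cases.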

However, for discrete distributions the problem of
tied observations has been discussed in the literature.
Pyke (1965), in discussing the airplane accident data in
his paper, gave the rule for tie breaking as follows:
$T_i = T_i - \varepsilon$ and $T_{i+1} = T_{i+1} + \varepsilon$ with $\varepsilon = .2$.
He did this in order to avoid any possible degeneracy
of the statistic $\sum \ln D_i$ if $D_i$ is equal to zero.
Two methods of treating ties have been discussed in the
literature in the context of rank tests. The first is
to randomly order the tied observations and the second
method is to attribute to each of the tied observations
the average rank of those tied. But there has been no
investigation in the literature of ties and grouping
relevant to this work. We examine in this work some
approaches to the problem.

In order to be able to use any test, it is necessary
for one to know something about the distribution of the
test statistic under $H_0$, the null hypothesis. The tests
proposed in the literature may also be used for non-uniform
parent distributions provided only that under
the null hypothesis one has a set of random variables
which are distributed as spacings from a uniform distribution.
We are thus also interested in this study in
investigating how sensitive the proposed tests are to
the particular underlying distribution from which
the spacings are derived. The information that may
be obtained in this respect will enable the statistician
to decide as to the validity of using the proposed
tests for other hypothesis-testing problems involving
non-uniform distributions. This property, known as
robustness, has been studied in the literature quite
extensively by Box (1953 and 1955) and others in
contexts not directly relevant to this work. The
particular aspects of robustness which we consider
relate to the values of the critical points of the
test statistics.


CHAPTER II


REVIEW OF LITERATURE


A brief historical survey of the theory of spacings
as applied to statistical problems is given in
Pyke (1965). However, despite all the earlier references
given in Pyke, it must be conceded that it was Greenwood
(1946) who, in connection with a problem in epidemiology,
posed the general problem of testing statistical hypotheses
based on sample spacings. He formulated the
problem as the need to test:

1) Whether a given set of points on the unit interval
could have arisen from the independent selection of points
$X_i$, where the $X_i$'s (i=1,2,...) are independent random
variables each distributed uniformly over the interval
(0,1), or

2) Whether the set of intervals $D_i$ generated by the
$X_i$'s are too nearly equal for the above hypothesis to be
tenable.

Greenwood suggested one test based on the squares
of the intervals $D_i$. However, in the discussion of his
paper which he read to the Royal Statistical Society,
other tests were suggested. Much of the research on
spacings that has been accomplished since that time can
be directly attributed to the suggestions made at that
presentation. In 1953 Darling reviewed the results
obtained between 1946 and 1953 and also gave the first
unified approach to the distribution theory of uniform
spacings using Dirichlet's integrals. Since Darling's
paper a great deal has been published in the literature
dealing primarily with the asymptotic behaviour of tests
based on spacings. The two survey papers by Pyke (1965
and 1970) give a comprehensive bibliography of the
publications. For the rest of this chapter we will concern
ourselves with the work of Greenwood and Moran (1947),
Sherman (1950) and Darling (1953) for the following
reasons:
1) The statistics proposed by them are not only common
but have also been studied in detail.
2) Darling's statistics are of special interest in
this work because of their degeneracy if $D_i = 0$.

We next define the statistics and discuss their
properties.


Greenwood and Moran


The statistic which Greenwood proposed is of the
form $G_n = \sum D_i^2$. Greenwood also gave a few properties
of its distribution. Commenting on his own suggestion
in the course of the presentation, Greenwood admitted
that although he could not think of any logical advantage
in using the interval test, it nevertheless had a
psychological attraction for him as a medical statistician
because of the hope he cherished that it might
some day unveil the incubation period of a disease.
Moran (1947) proved the asymptotic normality of
the above statistic. He first proved a general limit
theorem for the random variable

$$\frac{X_1^2 + X_2^2 + \cdots + X_n^2}{(X_1 + X_2 + \cdots + X_n)^2}$$

where $X_1, X_2, \ldots, X_n$ are independent and identically
distributed on $(0,\infty)$ with finite fourth moments. Moran
then utilized the construction of uniform spacings as
exponential random variables proportional to their sum
to prove the asymptotic normality of $G_n$. He also gave
the mean and variance of the statistic to be

$$E(G_n) = \frac{2}{n+2}$$

and

$$V(G_n) = \frac{4n}{(n+2)^2 (n+3)(n+4)}.$$

These results were also obtained independently by Kimball
(1950). Moran concluded his paper with a table of values
of the skewness and kurtosis of the statistic showing
its rather slow tendency towards normality.
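
The mean and variance quoted above can be checked by simulation. The sketch below is illustrative only: the function names, the seed, the sample size n = 30 and the number of replications are assumptions, and Python's built-in generator stands in for whatever generator a reader prefers.

```python
import random


def uniform_spacings(n, rng):
    """Return the n+1 spacings cut from (0,1) by n independent uniform points."""
    bounds = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def greenwood(spacings):
    """Greenwood's statistic: the sum of squared spacings."""
    return sum(d * d for d in spacings)


def moran_mean(n):
    return 2.0 / (n + 2)


def moran_variance(n):
    return 4.0 * n / ((n + 2) ** 2 * (n + 3) * (n + 4))


rng = random.Random(2024)
n, reps = 30, 4000
values = [greenwood(uniform_spacings(n, rng)) for _ in range(reps)]
mean = sum(values) / reps
var = sum((v - mean) ** 2 for v in values) / (reps - 1)

print(mean, moran_mean(n))       # simulated and theoretical mean
print(var, moran_variance(n))    # simulated and theoretical variance
```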


Sherman


The statistic which Sherman proposed and for which
he proved asymptotic normality is of the form

This statistic is only a slight modification of one
suggested by Kendall in the discussion of the paper
by Greenwood (1946). By the method of moments, Sherman
derived the mean of $G_n$ as

$$E(G_n) = \{n/(n+1)\}^{n+1}$$

which converges to 1/e as $n \to \infty$, and the variance as

Sherman then went through a rather lengthy and exacting
computation of the moment integrals to prove that the
random variable $G_n$ is asymptotically normally distributed
with the above mean and variance. He proved the above
result by showing that the distribution function of the
standardized variable

$$\frac{G_n - E(G_n)}{\sqrt{V(G_n)}}$$

is asymptotically normal.
Darling


Darling (1953) provided the first general method
for deriving limit theorems for statistics of the form
$G_n = \sum_i g_n(D_i)$, based on a formula for the characteristic
function of $G_n$ which he derived by way of an extension
to Dirichlet's integrals. Darling then proposed two
new statistics

$$\sum_i \ln(D_i)$$

and

$$\sum_i \frac{1}{D_i}.$$

He showed the former statistic to be asymptotically normally
distributed with asymptotic mean

$$-(n+1)(\ln(n) + \gamma)$$

and variance

$$(n+1)(\pi^2/6 - 1)$$

where $\gamma$ is Euler's constant given as 0.577216. However,
for the statistic $\sum 1/D_i$, Darling did not obtain the
asymptotic normal distribution. Instead he showed that
the statistic has a limiting exponential distribution
which is quasi-stable. A distribution F(x) is defined
to be quasi-stable (the term is due to Lévy) if given
any two independent random variables $X_1, X_2$ having
distribution F(x) and any positive numbers $C_1, C_2$, there
exists a finite real number B (where B > 0) such that
$(C_1 X_1 + C_2 X_2)/B$ has the same distribution as $X_1$ and
$X_2$.

Table 2.1 gives a summary of the properties of
the four test statistics discussed above.
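
As a rough numerical check of the asymptotic mean and variance quoted for Darling's $\sum \ln D_i$, the following Python sketch simulates uniform spacings. The function names, the seed, n = 30 and the number of replications are illustrative assumptions. For finite n the simulated values are expected to differ somewhat from the asymptotic ones.

```python
import math
import random

EULER_GAMMA = 0.577216   # value of Euler's constant quoted in the text


def uniform_spacings(n, rng):
    """Return the n+1 spacings cut from (0,1) by n independent uniform points."""
    bounds = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def darling_log(spacings):
    """Sum of ln D_i; undefined (degenerate) if any D_i is zero."""
    return sum(math.log(d) for d in spacings)


def darling_reciprocal(spacings):
    """Sum of 1/D_i; undefined (degenerate) if any D_i is zero."""
    return sum(1.0 / d for d in spacings)


def asymptotic_mean_log(n):
    return -(n + 1) * (math.log(n) + EULER_GAMMA)


def asymptotic_variance_log(n):
    return (n + 1) * (math.pi ** 2 / 6.0 - 1.0)


rng = random.Random(1953)
n, reps = 30, 4000
logs = [darling_log(uniform_spacings(n, rng)) for _ in range(reps)]
mean_log = sum(logs) / reps
var_log = sum((v - mean_log) ** 2 for v in logs) / (reps - 1)

print(mean_log, asymptotic_mean_log(n))
print(var_log, asymptotic_variance_log(n))
```

Both statistics fail outright when a spacing is exactly zero, which is the degeneracy that motivates a tie breaking rule.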
CHAPTER III


COMPUTATIONAL DETAILS


The Monte Carlo method is defined as a method of
solving various problems in computational mathematics
by constructing for each problem a random process with
parameters equal to the required quantities of that
problem. There are two types of problems that can be
handled by Monte Carlo techniques. These are probabilistic
or deterministic according to whether or not they
are directly concerned with the behaviour and outcome of
random processes.

In this study, Monte Carlo techniques were used
to attack the problem discussed in Chapter I. One
part consisted of generating random numbers from the
uniform (0,1) distribution, the chi-square distribution
with two degrees of freedom and the standardized normal
distribution.

Lurie and Hartley (1971), in a work on goodness of
fit tests based on the spacing of selected order statistics,
presented an algorithm for generating ordered
random variates from specified distributions as follows:
First generate ordered uniform (0,1) random numbers
based on the following two basic equations:

i) $U_{(1)} = 1 - V_1^{1/n}$, where $V_1$ is a uniform variate and
n is the desired sample size. The cumulative distribution
function of $U_{(1)}$ is given by
$\Pr\{U_{(1)} \le u\} = 1 - (1-u)^n$.

ii) Given $U_{(i)}$, the next uniform variate in the
sequence, $U_{(i+1)}$, can be generated using the relation

$$U_{(i+1)} = 1 - (1 - U_{(i)})\, V_{i+1}^{1/(n-i)}$$

where $V_{i+1}$ is another independent uniform (0,1) variate.
The conditional cumulative distribution function of
$U_{(i+1)}$ given $U_{(i)}$ is then given by

$$\Pr\{U_{(i+1)} \le u \mid U_{(i)}\} = 1 - \left(\frac{1-u}{1-U_{(i)}}\right)^{n-i}.$$

In this way an ordered sequence of uniform (0,1) random
variates is obtained such that

$$U_{(1)} \le U_{(2)} \le \cdots \le U_{(n)}.$$

Once the $U_{(i)}$'s are obtained, a sequence of random
numbers $X_i$ in ascending order can be generated from
other distributions using the inverse probability integral
transformation

$$X_i = F^{-1}(U_{(i)})$$

where $F(X_i)$ is the cumulative distribution function of
the variate $X_i$.

Applying the techniques discussed above, an ordered
sequence of random numbers was generated from the uniform
(0,1) distribution using the well-tested pseudo-random
number generator for System/360 described by Lewis and
others (1969).

For the chi-square distribution with two degrees
of freedom, the exact transformation $X_i = -2 \ln(1 - U_{(i)})$
was used to generate an ordered sequence.
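
The two recurrences and the chi-square transformation can be sketched in Python as follows. This is a minimal sketch: the function names are hypothetical, and Python's built-in generator is used in place of the System/360 generator mentioned above.

```python
import math
import random


def ordered_uniforms(n, rng):
    """Generate U_(1) <= ... <= U_(n) directly, without sorting.

    With U_(0) taken as 0, each step applies
    U_(i+1) = 1 - (1 - U_(i)) * V**(1/(n-i)), V uniform on (0,1).
    """
    sequence = []
    previous = 0.0
    for i in range(n):
        v = rng.random()
        previous = 1.0 - (1.0 - previous) * v ** (1.0 / (n - i))
        sequence.append(previous)
    return sequence


def ordered_chi_square_2(uniforms):
    """Inverse cdf of chi-square with 2 degrees of freedom: x = -2 ln(1 - u)."""
    return [-2.0 * math.log(1.0 - u) for u in uniforms]


rng = random.Random(7)
u = ordered_uniforms(50, rng)
x = ordered_chi_square_2(u)
print(u[:3], x[:3])
```

Because the inverse cdf is increasing, the transformed values come out already ordered and no sorting step is needed.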

For the standardized normal distribution use was
made of the well known Hastings (1955) approximation
for $x_p$ given p, where $x_p$ is as defined below:

Let $p = \frac{1}{\sqrt{2\pi}} \int_{x_p}^{\infty} e^{-y^2/2}\,dy$ and $0 < p \le 0.5$. Then

$$x_p = t - \frac{c_0 + c_1 t + c_2 t^2}{1 + d_1 t + d_2 t^2 + d_3 t^3} + \epsilon(p)$$

where $t = \sqrt{\ln(1/p^2)}$, $|\epsilon(p)| < 4.5 \times 10^{-4}$
and

    c_0 = 2.515517     d_1 = 1.432788
    c_1 = 0.802853     d_2 = 0.189269
    c_2 = 0.010328     d_3 = 0.001308

and p is an ordered uniform (0,1) variate. The above
approximation for $x_p$ is implemented below:

1) Generate an ordered random number $U_{(i)}$ between 0
and 1.

2) If $U_{(i)} \le 0.5$, use $p = U_{(i)}$, otherwise use $p =
1 - U_{(i)}$.

3) Evaluate t and subsequently $x_p$.

4) If $U_{(i)} > 0.5$, replace the value of $x_p$ obtained
in (3) by $-x_p$.

Finally, from each of the three distributions, samples
of sizes 30, 50, 80 and 100 random numbers were generated
and used in computing 1000 histories for each of the
four test statistics in Table 2.1 as follows:

I) Read in a new value for $\varepsilon$ and $\delta$.

II) Generate an ordered sequence of random numbers of
sample size n from the uniform (0,1) (or chi-square (2)
or standard normal) distribution.

III) Compute the spacings $D_i$ between pairs of the
ordered sequence of random numbers, applying
the rule for tie breaking and grouping to the
values of $D_i$ when necessary.

IV) Using the $D_i$'s compute the value of each of the
test statistics and store the result for each
statistic in a separate vector, say V.

V) Repeat steps (II)-(IV) to obtain a total of 1000
histories for each test statistic.

VI) Rank the 1000 histories so obtained in ascending
order of magnitude to produce $V_{(1)}, V_{(2)}, \ldots, V_{(1000)}$.

VII) Determine the frequency distribution of the 1000
histories for each of the test statistics.

VIII) Compute estimates for the mean, variance, skewness
and kurtosis statistics, for each of the test
statistics.

IX) Output the results in (VII) and (VIII) and also
$V_{(900)}$, $V_{(950)}$ and $V_{(990)}$ corresponding to the
90, 95 and 99 percent points respectively.

X) Repeat (I)-(IX) until all values of $\delta$ and $\varepsilon$ are
tried for each of the distributions.
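
The steps above can be sketched in Python for the uniform (0,1) case. Everything named here is hypothetical: the function names, the seed, and the use of sorting with Python's built-in generator. Kurtosis is taken as excess kurtosis (zero for a normal distribution), which is an assumption. Sherman's statistic is left out because its formula is not reproduced in this text.

```python
import math
import random

STATISTICS = {
    "greenwood": lambda d: sum(x * x for x in d),
    "darling_log": lambda d: sum(math.log(x) for x in d),
    "darling_reciprocal": lambda d: sum(1.0 / x for x in d),
}


def spacings_with_rule(ordered, delta, epsilon):
    """Step III: spacings on (0,1), with every D_i <= delta set to epsilon."""
    bounds = [0.0] + ordered + [1.0]
    raw = [b - a for a, b in zip(bounds, bounds[1:])]
    return [epsilon if d <= delta else d for d in raw]


def moments(values):
    """Step VIII: mean, variance, skewness and excess kurtosis."""
    m = len(values)
    mean = sum(values) / m
    central = lambda k: sum((v - mean) ** k for v in values) / m
    var = central(2)
    return mean, var, central(3) / var ** 1.5, central(4) / var ** 2 - 3.0


def run_histories(n, delta, epsilon, histories=1000, seed=1):
    """Steps II-IX for one (delta, epsilon) pair and one sample size."""
    rng = random.Random(seed)
    results = {name: [] for name in STATISTICS}
    for _ in range(histories):
        ordered = sorted(rng.random() for _ in range(n))      # step II
        d = spacings_with_rule(ordered, delta, epsilon)       # step III
        for name, statistic in STATISTICS.items():            # step IV
            results[name].append(statistic(d))
    summary = {}
    for name, values in results.items():
        values.sort()                                         # step VI
        points = {p: values[p * histories // 100 - 1] for p in (90, 95, 99)}
        summary[name] = {"moments": moments(values), "percent_points": points}
    return summary


summary = run_histories(n=30, delta=0.001, epsilon=0.001, histories=200, seed=3)
for name, result in summary.items():
    print(name, result["percent_points"])
```

With 1000 histories the percent points are the 900th, 950th and 990th ordered values, as in step IX. Step X corresponds to calling `run_histories` once for each pair of $\delta$ and $\varepsilon$.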
CHAPTER IV


RESULTS FOR CONTINUOUS DISTRIBUTION CASE


As discussed in the introduction, the purpose of
this study is to evaluate the sensitivity of the four
test statistics discussed in Chapter II to ties and
grouping errors, and also to the particular underlying
distribution. We therefore include in this chapter
the critical values obtained for each
test statistic for each underlying distribution. For
the uniform (0,1) distribution, the critical values are
the standardized Monte Carlo estimates of the 90, 95
and 99 percent points. But for both the $\chi^2_2$ and the
standard normal distributions the critical values are
the Monte Carlo estimates of the 90, 95 and 99 percent
points. The critical values are of particular interest
in this study because they help make the results of the
hypothesis test much more meaningful to the statistician.
Hence their inclusion will enable the statistician to
realize how far wrong he may be by using the asymptotic
results. We also give the estimated values of the
skewness, kurtosis, mean and variance for each of the
four test statistics under sampling from the various
distributions. We propose to examine the results for
each of the distributions separately and to comment
briefly on them.
Uniform (0,1) Distribution


The asymptotic normality of the Greenwood, Sherman
and Darling's $\sum \ln D_i$ statistics has been proved analytically
under the assumption of sampling from the
uniform (0,1) distribution. In Table 4.1 we give the
estimated values of the standardized 90, 95 and 99
percent points for the above three statistics based
on various sample sizes. The well known values for the
above percentage points of the standard normal distribution
are 1.64, 1.96 and 2.58 respectively.

For a sample of size thirty, the estimates obtained
for the 95 and 99 percent points for the Greenwood statistic
for the selected values of $\varepsilon$ were remarkably
close to each other and to the asymptotic values quoted
above. However, the 90 percent point was quite different
from the accepted value of 1.64. As the sample size
increased, the gap between the percentage points for
the different values of $\varepsilon$ and $\delta$ widened
correspondingly. These
variations of the percentage points due to different
values of $\delta$ and $\varepsilon$ were quite pronounced for both the
Sherman and Darling's $\sum \ln D_i$ statistics, giving negative
estimates in some cases for the former statistic.
On the whole the variation in the upper tail of the
distribution of each of the three statistics was
heightened for each sample size as $\varepsilon$ and $\delta$ increased
to 0.01. It was also observed that as the sample size
increased, the values obtained for the 90, 95 and 99
percent points for different values of $\delta$ and $\varepsilon$ deviated
rather widely from the well known results. This shows
that the test statistics tend to be more sensitive to
ties and grouping errors as the sample size increases.
Thus the common assumption that the asymptotic results
tend to be better with increasing sample size is not
supported in this case.

For the Darling $\sum 1/D_i$ statistic, the expected
asymptotic exponential behaviour was not realized.
Instead it was observed that for different values of $\delta$
and $\varepsilon$ asymptotic normality was gradually approached, as
illustrated in Fig. 4.4 with a plot of the frequency
histogram for a sample of size thirty. This departure
of the statistic from exponentiality was confirmed by
the estimates of the skewness and kurtosis statistics
given in Table 4.6. The skewness and kurtosis estimates
of the remaining three test statistics also approached
the expected values for the normal distribution. It
was however observed that the approach towards normality
was being realized at a much faster rate for both the
Sherman and Darling $\sum \ln D_i$ statistics than for the
Greenwood statistic. This is illustrated in Figs. 4.1-4.3.
However, the results obtained for the 90, 95 and 99
percent points of all the statistics do not support
this trend towards normality. The frequency histograms
of both the Sherman and Darling's $\sum 1/D_i$ statistics in
Figures 4.2 and 4.4 respectively tend to be almost
symmetrical about the different values of the mean for
the two statistics. This is confirmed by the estimates
of the skewness statistic, which are close to zero for both
of the test statistics. But for both the Greenwood and
Darling's $\sum \ln D_i$ statistics, the frequency histograms
are non-symmetrical, as supported by the estimated
values of the skewness statistic.

In a comparison of the theoretical and experimental
values of the mean the following observations were made
on the Greenwood, Sherman and Darling's $\sum \ln D_i$ statistics:

1) For the Greenwood statistic, the values obtained
for the different values of $\delta$ and $\varepsilon$ did not vary very much
for the same sample size. This indicates that the Greenwood
statistic was not too sensitive to the different
values of $\delta$ and $\varepsilon$.

2) For both the Sherman and Darling's $\sum \ln D_i$ statistics
the experimental values of the mean differed rather
widely from the theoretical values for values of $\delta$ and
$\varepsilon$ equal to 0.005 and 0.01 for the different sample sizes.
This variation in the mean estimates resulted in the
widely differing values obtained for the different
percentage points for each sample size. These observations
account for the extreme sensitivity in the
tails of the distributions of both the Sherman and
Darling's $\sum \ln D_i$ test statistics to tied observations
and to grouping.

The normalized upper tail percentage points for
Darling's $\sum 1/D_i$ statistic were computed using the
estimated mean and variance obtained for each sample
size. This is because the theoretical values for the
mean and variance for the case when $\sum 1/D_i$ approaches
asymptotic normality have not been given in the literature.
An examination of the 90, 95 and 99 percent points
given in Table 4.2 nevertheless indicates that the $\sum 1/D_i$
statistic is also sensitive to the different values of $\delta$
and $\varepsilon$. We also notice that as the sample size increases
the results for the percentage points do not get better
as theoretically expected.

In Table 4.3 we give the results obtained for the
90, 95 and 99 percent points for different values of $\varepsilon$
keeping $\delta$ constant for a sample of size thirty. The
computational rationale for keeping $\delta$ constant while
varying $\varepsilon$ is to evaluate the sensitivity of the test
statistics to the different values of $\varepsilon$. The closeness
of the results for the three statistics of Greenwood,
Sherman and Darling's $\sum \ln D_i$ indicates that their
constant is not very pronounced. However, a closer look
at the Sherman values reveals that the estimates of the
percentage points start off at a high value for $\varepsilon = .001$,
then decrease to a minimum value around $\varepsilon = .04$ and
finally rise up to a maximum at $\varepsilon = .01$.


The results of the upper tail percentage points
of Darling's $\sum 1/D_i$ statistic given in Table 4.4 were
also remarkably consistent, indicating the relative
insensitivity of the statistic to the different values of
$\varepsilon$. In Table 4.5 we give the theoretical estimates of the
above percentage points for the exponential distribution
with the parameter $\lambda$ estimated as the reciprocal of the
mean obtained for each sample size. Comparing the
results contained in the two tables we again realize
the vast difference between the two sets of values,
indicating a rather slow tendency of the $\sum 1/D_i$ statistic
towards exponential form asymptotically. However, a plot
of the frequency histogram for all values of $\varepsilon$ showed a
distribution which is exponential in form. One case is
illustrated in Figure 4.5 for a sample of size thirty
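
The theoretical percentage points of an exponential distribution with $\lambda$ estimated as the reciprocal of a sample mean can be computed as in the Python sketch below. The function name and the mean of 100 are hypothetical; the value is not taken from the tables.

```python
import math


def exponential_percent_point(sample_mean, percent):
    """Point x with 1 - exp(-lam * x) = percent/100, where lam = 1/sample_mean."""
    lam = 1.0 / sample_mean
    return -math.log(1.0 - percent / 100.0) / lam


hypothetical_mean = 100.0   # placeholder, not a value from the study
points = {p: exponential_percent_point(hypothetical_mean, p) for p in (90, 95, 99)}
for p, x in points.items():
    print(p, round(x, 2))
```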
Results of Estimates using U(0,1) Distribution with Sample Size 50

Results of Estimates using U(0,1) Distribution with Sample Size 80

Results of Estimates using U(0,1) Distribution
The $\chi^2_2$ Distribution


For samples from this distribution, the critical
values of the 90, 95 and 99 percent points are given in
Table 4.7. In Table 4.8 we give the estimated values
of the skewness, kurtosis, mean and variance for the
four test statistics. For the Greenwood, Sherman and
Darling's $\sum \ln D_i$ statistics remarkably consistent
results were obtained for the different values of $\delta$ and
$\varepsilon$ for each percentage point within each sample for the
three test statistics. However, for the Darling $\sum 1/D_i$
statistic, a sharp drop was noticed for each percent point
for each increase in the values of $\delta$ and $\varepsilon$.

The skewness and kurtosis estimates showed that the
approach towards asymptotic normality was much faster
for the two Darling statistics than for the Sherman statistic.
The Greenwood statistic on the other hand did
not show any tendency towards asymptotic normality;
rather its frequency distribution was consistently
exponential in form. In Figures 4.6-4.9 we give plots of
the frequency histograms obtained for all the four test
statistics. The mean values obtained for all the
statistics did not vary much with the different values
of $\delta$ and $\varepsilon$ for each sample size.
Critical Values of the Statistics for $\chi^2_2$ Distribution with Sample Size 50
(columns: Statistic, $\delta = \varepsilon$, $\alpha$ levels)

Critical Values of the Statistics using $\chi^2_2$ Distribution with Sample Size 80

using $\chi^2_2$ Distribution with Sample Size 100

Results of Estimates using $\chi^2_2$ Distribution with Sample Size 30
(columns: Statistic, $\delta = \varepsilon$, Skewness)

Results of Estimates using $\chi^2_2$ Distribution with Sample Size 50

Results of Estimates using $\chi^2_2$ Distribution with Sample Size 80
(columns: Statistic, $\delta = \varepsilon$, Skewness, Kurtosis, Mean, Variance)

Results of Estimates using $\chi^2_2$ Distribution with Sample Size 100
The Normal (0,1) Distribution


The critical values of the four test statistics with an
underlying standard normal distribution are presented in
Table 4.9. For each sample size, the estimated percentage
points of the Greenwood and Sherman statistics showed little
variation across the different values of $\delta$ and
$\epsilon$. The two Darling statistics, however, exhibited
some variation in the estimated percentage points for the
different values of $\delta$ and $\epsilon$ within each
sample size.

The skewness and kurtosis estimates given in Table 4.10
indicate a clear tendency towards the normal distribution for
all the test statistics except Greenwood's. This is
illustrated by the frequency histograms in Figures 4.10-4.13.
The mean estimate for each of the four statistics, excluding
Darling's $\sum 1/D_i$, also did not vary much with the
different $\delta$ and $\epsilon$ values within each sample
size. These observations indicate that, except for the Darling
$\sum 1/D_i$ statistic, the statistics are not very sensitive
to the different $\delta$ and $\epsilon$ values used in the
computations. For a summary of the results contained in this
chapter and the conclusions which may be inferred from them,
the reader is referred to Chapter VI.
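The moment estimates reported for each statistic (mean, variance, skewness and kurtosis) can be sketched as follows. This is only an illustration under stated assumptions: the estimators use divisor n, kurtosis is taken as excess kurtosis (so that the normal distribution gives zero), and `greenwood` is the plain, unscaled sum of squared spacings on (0,1). The function names, the seed and the number of replicates are hypothetical, and the grouping and tie-breaking constants are not applied here.

```python
import random


def moment_summary(values):
    """Return (mean, variance, skewness, excess kurtosis) of a sample.

    Moment estimators with divisor n: skewness is m3 / m2**1.5 and
    excess kurtosis is m4 / m2**2 - 3, so both shape measures are
    near zero for normally distributed values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def greenwood(sample):
    """Sum of squared spacings of a sample on (0, 1) (assumed, unscaled form)."""
    points = [0.0] + sorted(sample) + [1.0]
    return sum((b - a) ** 2 for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, for repeatability only
    # 2000 Monte Carlo replicates of the statistic for samples of size 30
    replicates = [greenwood([rng.random() for _ in range(30)])
                  for _ in range(2000)]
    print(moment_summary(replicates))
```

A positive skewness together with a large excess kurtosis, as estimated for the Greenwood statistic, signals a long upper tail, whereas values near zero are what a near-normal frequency distribution would give.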
Table 4.10

Results of Estimates using N(0,1)

Sample Size 30

Statistic                   $\delta = \epsilon$   Skewness      Kurtosis      Mean            Variance
Greenwood                   .001                  1.85622       5.11418       1.47398         0.66213
                            .005                  1.85623       5.11424       1.47400         0.66213
                            .01                   1.85625       5.11432       1.47413         0.66213
Sherman                     .001                  0.49729       0.36601       1.66639         0.11078
                            .005                  0.49663       [illegible]   1.66493         [illegible]
                            .01                   0.49452       0.36428       [illegible]     0.11084
Darling's $\sum \log D_i$   .001                  -0.20694      -0.12846      [illegible]     [illegible]
                            .005                  -0.16388      [illegible]   [illegible]     [illegible]
                            .01                   [illegible]   [illegible]   [illegible]     [illegible]
Darling's $\sum 1/D_i$      .001                  0.96900       0.83805       1271.46191      401047.43750
                            .005                  0.39291       -0.00480      [illegible]     [illegible]
                            .01                   [illegible]   -0.07288      [illegible]     [illegible]


Table 4.10 (continued)

Sample Size 50

Statistic                   $\delta = \epsilon$   Skewness      Kurtosis      Mean            Variance
Greenwood                   .001                  2.05436       [illegible]   [illegible]     0.51939
                            .005                  2.05437       6.51988       1.26004         0.51999
                            .01                   2.05440       6.52026       1.26042         [illegible]
Sherman                     .001                  0.38176       [illegible]   1.86283         0.09969
                            .005                  0.38149       -0.01484      [illegible]     0.09963
                            .01                   0.38147       [illegible]   [illegible]     0.09965
Darling's $\sum \log D_i$   .001                  [illegible]   0.16271       -142.68819      48.80760
                            .005                  [illegible]   0.01118       [illegible]     [illegible]
                            .01                   -0.06487      -0.13090      [illegible]     [illegible]
Darling's $\sum 1/D_i$      .001                  0.71960       [illegible]   3205.75684      1059533.00000
                            .005                  0.36841       0.27343       [illegible]     117548.81250
                            .01                   0.19136       -0.04066      1648.53687      36137.46484


Table 4.10 (continued)

Sample Size 80

Statistic                   $\delta = \epsilon$   Skewness      Kurtosis      Mean            Variance
Greenwood                   .001                  2.35386       8.49188       1.09469         0.43151
                            .005                  2.35384       8.49170       1.09483         0.43151
                            .01                   [illegible]   8.49041       [illegible]     0.43152
Sherman                     .001                  0.68965       0.85267       2.04024         0.09226
                            .005                  0.68969       0.84963       2.03009         0.09229
                            .01                   0.68798       0.85021       1.99976         0.09239
Darling's $\sum \log D_i$   .001                  -0.12567      [illegible]   -281.27246      77.86554
                            .005                  -0.06506      0.01065       -274.45264      52.17628
                            .01                   0.01048       0.00299       -266.34399      36.54663
Darling's $\sum 1/D_i$      .001                  0.43184       0.22959       7382.34766      2500319.00000
                            .005                  0.23064       0.05033       4612.28906      239621.56250
                            .01                   0.08449       [illegible]   3484.95581      64341.89062


Table 4.10 (continued)

Sample Size 100

Statistic                   $\delta = \epsilon$   Skewness      Kurtosis      Mean            Variance
Greenwood                   .001                  [illegible]   4.69563       1.01046         0.32688
                            .005                  1.87758       4.69569       1.01067         0.32688
                            .01                   1.87760       4.69543       1.01210         0.32688
Sherman                     .001                  0.29049       0.03066       [illegible]     [illegible]
                            .005                  0.28727       [illegible]   2.09802         [illegible]
                            .01                   0.27796       0.02039       2.05158         0.07988
Darling's $\sum \log D_i$   .001                  -0.12563      0.16378       [illegible]     [illegible]
                            .005                  -0.09127      -0.11990      -363.05859      61.63948
                            .01                   -0.04028      -0.25473      -350.62646      41.70163
Darling's $\sum 1/D_i$      .001                  0.22293       -0.05179      10929.74609     3733730.0000
                            .005                  0.08284       -0.11058      6649.82031      319247.62500
                            .01                   0.00337       [illegible]   4919.22266      77504.31250

Fig. 4.10 Frequency Histogram of Greenwood's $\sum D_i^2$
statistic under N(0,1) distribution. (Vertical axis: frequency
of $\sum D_i^2$; horizontal axis: frequency intervals.)
CHAPTER V


RESULTS FOR DISCRETE DISTRIBUTION CASE 


In this chapter we present the results obtained for both
Pyke's airplane accident data and the Poisson distribution
case, using different values of $\epsilon$ to break the ties.
We examine how sensitive the test statistics given in Pyke
(1965) are to ties and grouping errors when they are applied
to Pyke's data and to data from the Poisson distribution. The
statistics, with their asymptotic mean $\mu$ and variance $V$,
are given in Table 5.1 below:
The $D'_i$ are normalized spacings defined as

    $$D'_i = (n+1)(U_i - U_{i-1})$$

under the assumption of exponential distribution, where

    $$U_i = H(T_i) = 1 - \exp(-\lambda T_i)$$

and

    $$0 \le U_1 \le U_2 \le \cdots \le U_n \le 1,$$

with $n$ denoting the sample size and $\lambda$ a constant.
Pyke also gave the normalized sample value of the statistic as

    $$\frac{t - (n+1)\mu}{\sqrt{(n+1)V}}$$

where $t$ is the value of the statistic.

In discussing the airplane accident data, Pyke used the value
$\epsilon = 0.2$ to break the ties. In this study different
values of $\epsilon$ were tried on the data, and the results
obtained for the normalized sample values are displayed in
Table 5.2. The results did not vary appreciably with
$\epsilon$. This is probably because only three ties occur in
the whole sample of size 31. We now consider the Poisson
distribution example.
Poisson Distribution


The Poisson random numbers used in this work were generated
by the following procedure:
1) Generate a uniform (0,1) random variate $U_i$.
2) Form the product $Z_k = U_0 U_1 U_2 \cdots U_k$ of a
sequence of $k+1$ independent uniform random variates.
3) If $Z_k < e^{-\lambda}$, then the lowest value of $k$ which
first causes the given relation to hold is a random variable
having the Poisson distribution with mean $\lambda$.
The proof for the above algorithm is given in Carnahan and
others (1969). For this study, the value of $\lambda$ was set
to 3, and 100 Poisson random variates were generated as above
and ordered in ascending sequence.
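The three steps above can be sketched as follows. This is a minimal illustration rather than the program used in the study: the function names and the seed argument are hypothetical, and Python's standard `random` module stands in for whatever uniform generator was actually employed.

```python
import math
import random


def poisson_variate(lam, rng):
    """Draw one Poisson(lam) variate by multiplying uniform (0,1) variates.

    Returns the smallest k for which U_0 * U_1 * ... * U_k < exp(-lam).
    """
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()           # U_0
    while product >= threshold:
        k += 1
        product *= rng.random()      # bring U_k into the product
    return k


def ordered_poisson_sample(n, lam, seed=None):
    """Generate n Poisson(lam) variates and return them in ascending order."""
    rng = random.Random(seed)
    return sorted(poisson_variate(lam, rng) for _ in range(n))


if __name__ == "__main__":
    # lambda = 3 and n = 100 as in the text; the seed is arbitrary.
    print(ordered_poisson_sample(100, 3.0, seed=0))
```

With $\lambda = 3$ only a handful of distinct values occur among 100 variates, so long runs of ties are to be expected in the ordered sample.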

As mentioned in Chapter I, the problem of tied observations
for discrete distributions is discussed in the literature.
However, Pyke's tie-breaking rule for the airplane accident
data was found to be inapplicable here, because more than two
ties occur in succession in most of the Poisson random
sequences generated. We are thus compelled to define a new
rule for tie breaking in the following way.

Let $X_i, X_{i+1}, \ldots, X_{i+k}, X_{i+k+1}$ be a sequence
of Poisson random numbers such that
$X_i = X_{i+1} = \cdots = X_{i+k}$ and $X_{i+k} < X_{i+k+1}$,
with $(k+1)$ denoting the maximum number of ties occurring in
the sample. If we denote $\epsilon_0$ to be the maximum value
$\epsilon$ can take on, then

    $\epsilon_0$ = [numerator illegible] / $(k+2)$

and

    [illegible]

Once this upper limit for $\epsilon$ is determined, different
values of $\epsilon$ are then chosen such that each of them is
less than or equal to $\epsilon_0$. Having decided on what
value $\epsilon$ should take, we then break the ties as
follows:

    $$\begin{aligned}
    X_{i+1} &= X_i + \epsilon \\
    X_{i+2} &= X_i + 2\epsilon \\
            &\;\;\vdots \\
    X_{i+k} &= X_i + k\epsilon
    \end{aligned}$$

such that $X_i < X_{i+1} < \cdots < X_{i+k} < X_{i+k+1}$.
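The tie-breaking rule can be sketched as below. The sketch is written under assumptions of our own: the function name is hypothetical, every run of tied values in the ordered sample is spread in the same way, and instead of computing the upper limit $\epsilon_0$ it simply checks that the chosen $\epsilon$ keeps each spread run strictly below the next distinct value.

```python
def break_ties(sorted_values, eps):
    """Spread each run of tied values by adding 0, eps, 2*eps, ... in order.

    `sorted_values` must be in ascending order. A ValueError is raised
    when eps is not positive, or when it is too large to keep a spread
    run strictly below the next distinct value.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    result = []
    i, n = 0, len(sorted_values)
    while i < n:
        # Find the last index j of the run of values equal to sorted_values[i].
        j = i
        while j + 1 < n and sorted_values[j + 1] == sorted_values[i]:
            j += 1
        k = j - i                      # the run holds k + 1 tied values
        if j + 1 < n and sorted_values[i] + k * eps >= sorted_values[j + 1]:
            raise ValueError("eps too large for a run of %d ties" % (k + 1))
        for m in range(k + 1):
            result.append(sorted_values[i] + m * eps)
        i = j + 1
    return result


if __name__ == "__main__":
    print(break_ties([1, 1, 1, 2, 3, 3], 0.1))
```

Checking the ordering directly, rather than through a closed-form bound, keeps the sketch independent of the exact expression for $\epsilon_0$.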


In order to use the statistics given in Table 5.1, we apply
the transformation $H(x) = 1 - \exp(-\lambda x)$ to the
ordered Poisson random variates without any ties as follows.
If we let $T_1 < T_2 < \cdots < T_n$ denote the ordered
Poisson variates and $U_i = H(T_i)$, then the normalized
spacings are $D'_i = (n+1)(U_i - U_{i-1})$,
$i = 2, \ldots, n-1$. Because $U_i$ lies between 0 and 1
(i.e. $0 < U_i < 1$), the normalized spacings are defined.
Once the $D'_i$'s are computed, we group all those values
which are less than [illegible] and give them the value
[illegible] in order to avoid the degeneracy of the statistic
$\sum \ln(D'_i)$. Having obtained the $D'_i$'s, we then
compute the values of the statistics listed in Table 5.1. The
values obtained are then normalized using the asymptotic mean
and variance values and the relation

    $$\frac{t - (n+1)\mu}{\sqrt{(n+1)V}}$$

given by Pyke. In Tables 5.3 and 5.4 we give the normalized
sample values obtained for samples of size 20 and 100
respectively using different values of $\epsilon$. An
examination of the results shows that the statistics
involving the third and fourth powers exhibited much
variation in the values obtained for the different
$\epsilon$'s. This indicates how sensitive those two
statistics are to the different values of $\epsilon$. For the
statistics $\sum D'^2_i$ and $\sum (D'_i - 1)^2$ the
normalized values obtained for each $\epsilon$ were identical
for the two statistics, although the actual values of the
statistics were different. This apparent discrepancy is
explained by the different values of $\mu$ used in
normalizing the two statistics. The remaining two statistics
did not show much variation in the values obtained using
different values of $\epsilon$, indicating that they are
apparently less sensitive to the different $\epsilon$'s.
However, compared to the results obtained for the statistics
using Pyke's data, it is observed that on the whole the new
results for each statistic varied much with the different
values of $\epsilon$, which were, in the case of $n = 100$,
one tenth of those used for Pyke's data. This is certainly
due to the fact that the number of ties occurring in Pyke's
data is much smaller than what was observed with the Poisson
distribution data. It thus seems that the sensitivity of the
statistics to the different $\epsilon$'s depends to a great
extent on the number of tied observations occurring in the
sample data.
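The transformation, the flooring of very small spacings and Pyke's normalization can be put together in a short sketch. Several points are assumptions: the function names are hypothetical, the floor value `1e-6` is a placeholder for the cutoff used in the study, all successive differences $U_i - U_{i-1}$ are taken, and the values of `mu` and `v` in the usage lines are placeholders, not entries of Table 5.1.

```python
import math


def normalized_spacings(ordered_t, lam, floor=1e-6):
    """Return (n+1) * (U_i - U_{i-1}) for U_i = 1 - exp(-lam * T_i).

    `ordered_t` must be strictly increasing (ties already broken).
    Spacings smaller than `floor` are raised to `floor` so that the
    logarithm of every spacing stays finite.
    """
    n = len(ordered_t)
    u = [1.0 - math.exp(-lam * t) for t in ordered_t]
    return [max((n + 1) * (b - a), floor) for a, b in zip(u, u[1:])]


def normalized_value(t, n, mu, v):
    """Normalized sample value (t - (n+1)*mu) / sqrt((n+1)*v)."""
    return (t - (n + 1) * mu) / math.sqrt((n + 1) * v)


if __name__ == "__main__":
    # A small untied, ordered sample; lam = 3 as in the text.
    t_values = [0.0, 0.01, 0.02, 1.0, 1.01, 2.0]
    d = normalized_spacings(t_values, lam=3.0)
    log_stat = sum(math.log(x) for x in d)
    # mu and v below are placeholders standing in for Table 5.1 entries.
    print(normalized_value(log_stat, len(t_values), mu=0.0, v=1.0))
```

Because tied Poisson values separated only by a small $\epsilon$ map to nearly equal $U_i$, many spacings end up close to the floor, which is one way to see why the statistics react to the choice of $\epsilon$ when ties are numerous.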
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Normalized Sample Values of the Statistics 
using Poisson Distribution 


n = 20 


Statistic U5 
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Normalized Sample Values of the Statistics 
using Poisson Distribution 


n = 100 


Statistic 0.01 0.02 0.03 
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CHAPTER VI 


CONCLUSIONS AND EXTENSIONS 


In the course of the preceding chapters we have reviewed the
major work done in the area of tests of fit based on sample
spacings. We have examined rather carefully the problem of
tied observations and grouping errors arising in the context
of both continuous and discrete distributions. We have also
evaluated the sensitivity of the test statistics given in
Table 2.1 under distributions other than the uniform (0,1),
for which asymptotic results have been obtained. Monte Carlo
techniques were used to attack the problem, and the results
obtained are contained in Chapters IV and V for continuous
and discrete distributions respectively.

For continuous distributions, the $\chi^2$ and standard
normal distributions were used in addition to the uniform
(0,1) distribution because of their widespread use in
statistical studies. The Greenwood test statistic proved
quite sensitive in the upper tail, under uniform (0,1)
distribution, to the various values of $\delta$ and
$\epsilon$ used. It was also observed that, with increasing
sample size, the results obtained for the 90, 95 and 99
percent points, instead of approaching the well known results
for the standard normal distribution, differed rather widely
from them. This sensitivity to the different $\delta$'s and
$\epsilon$'s was not detected under either the $\chi^2$ or
the normal (0,1) distribution. For both distributions the
upper tail values were remarkably consistent for each sample
size. However the frequency distribution of this statistic
showed a slow trend towards normality under uniform (0,1)
distribution. But under the $\chi^2$ distribution the
frequency distribution was clearly exponential in form for
all sample sizes used. The frequency distribution of the
statistic under normal (0,1) distribution was positively
skewed for all sample sizes. These observations lead us to
conclude that under some conditions the Greenwood statistic
may be used for testing hypotheses on data which may be
assumed to have an underlying uniform (0,1) distribution. But
under no condition can the statistic be used on data assumed
to have come from normal (0,1) or $\chi^2$ distributions.
This is because the statistic is not robust.


The Sherman statistic was extremely sensitive to the
different values of $\delta$ and $\epsilon$ for each sample
size under uniform (0,1) distribution, giving in some cases
negative results. As with the Greenwood statistic, the
results obtained for the 90, 95 and 99 percent points became
increasingly worse as the sample size increased. However its
frequency distribution under uniform (0,1) distribution
consistently approached that of the normal, with estimates of
the skewness and kurtosis statistics also being close to the
normal (0,1) distribution results. Under the $\chi^2$
distribution the results obtained were quite similar to those
for the Greenwood statistic. The upper tails exhibited
results which showed little or no variation with the
different values of $\delta$ and $\epsilon$ used for each
sample size. The frequency distribution of the statistic
under $\chi^2$ distribution was also slightly positively
skewed. Again under the standard normal distribution, the
different $\delta$'s and $\epsilon$'s did not produce any
appreciable change in the upper tail values of the statistic
for each sample size tried. Also the frequency distribution
under standard normal distribution showed an increasing trend
towards normality as the sample size increased, although it
was slightly positively skewed. From the above observations
we are thus led to the conclusion that the Sherman statistic
was more sensitive under the uniform (0,1) distribution than
under the $\chi^2$ and normal (0,1) distributions, although
its frequency distribution under uniform (0,1) distribution
was more symmetrical than under the $\chi^2$ and normal (0,1)
distributions.




The Darling $\sum \ln D_i$ statistic, like the Sherman
statistic, was also very sensitive to the different
$\delta$'s and $\epsilon$'s used for each sample size under
uniform (0,1) distribution. But for the other two
distributions there was not much change in the upper tail
values obtained with the different $\delta$'s and
$\epsilon$'s tried for each sample size. The frequency
distributions under both the uniform (0,1) and standard
normal distributions were slightly negatively skewed. But
under the $\chi^2$ distribution, the frequency distribution
of the statistic displayed a remarkable tendency towards
normality, with estimates for skewness and kurtosis being
very close to the normal distribution value of zero. This
statistic was more robust under the uniform (0,1), $\chi^2$
and normal (0,1) distributions than the others.

In obtaining the upper tail percentage points for the
Darling $\sum 1/D_i$ statistic, the estimated mean and
variance for each sample size under uniform (0,1)
distribution were used to normalize the Monte Carlo estimates
of the 90, 95 and 99 percent points. The results indicated a
slight sensitivity to the different $\delta$'s and
$\epsilon$'s used, with values differing widely from the well
known results of the above percentage points under normal
(0,1) distribution. However, under the same uniform (0,1)
distribution but with $\delta$ (the grouping constant) held
constant, the results for each percentage point were the same
for the different $\epsilon$'s used to break the ties. Those
results also differed drastically from the theoretical
estimates obtained from the exponential distribution. The
exponential distribution was used because the frequency
distribution of the statistic using different $\epsilon$'s
was consistently exponential in form. But under the same
uniform (0,1) distribution with $\epsilon$ and $\delta$
varying, the frequency distribution for each sample size
closely approximated that of the normal distribution. This
observation was confirmed by the estimated values of the
skewness and kurtosis statistics. However, under the
$\chi^2$ and normal (0,1) distributions the results obtained
were completely different from those of the other three
statistics under the same distributions. The Darling
$\sum 1/D_i$ statistic varied appreciably with the different
values of $\delta$ and $\epsilon$ for each sample size under
the $\chi^2$ and normal (0,1) distributions. Also the
frequency distribution under either distribution showed an
approach towards normality, a result borne out by the
estimates of the skewness and kurtosis statistics under each
distribution. For the different $\delta$'s and $\epsilon$'s,
then, the frequency distribution of the statistic proved
quite insensitive to the different underlying distributions,
thus satisfying the condition for robustness.
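The normalization of Monte Carlo percent points described above can be illustrated with a short sketch. It is not the procedure of the study itself: the function name is hypothetical, a simple order-statistic estimate of each percent point is assumed, and the replicates are standardized by their own mean and standard deviation (divisor n) before being set beside the standard normal percent points.

```python
from statistics import NormalDist, mean, pstdev


def normalized_percent_points(replicates, levels=(0.90, 0.95, 0.99)):
    """Return a list of (level, standardized percent point, N(0,1) percent point).

    Each empirical percent point is taken from the ordered replicates
    and standardized with the mean and standard deviation of the
    replicates themselves.
    """
    ordered = sorted(replicates)
    centre, spread = mean(ordered), pstdev(ordered)
    rows = []
    for p in levels:
        # Simple order-statistic estimate of the 100p percent point.
        index = min(len(ordered) - 1, int(p * len(ordered)))
        observed = (ordered[index] - centre) / spread
        rows.append((p, observed, NormalDist().inv_cdf(p)))
    return rows


if __name__ == "__main__":
    # Stand-in replicates drawn from N(0,1); in a study these would be
    # Monte Carlo values of a test statistic.
    draws = NormalDist().samples(5000, seed=1)
    for row in normalized_percent_points(draws):
        print(row)
```

When the replicates come from a markedly skewed statistic, the standardized 99 percent point falls well above the normal value, which is the kind of departure reported for the upper tail.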
For discrete distributions the situation seems to be
different. In this case the sensitivity of the statistics
listed in Table 5.1 to ties and grouping errors showed great
dependence on the number of tied observations obtained in
each sample. This is confirmed by the results obtained for
the normalized sample values of both Pyke's data and the data
from the Poisson distribution.

In the preceding paragraphs we have looked mainly at the
asymptotic behaviour of the given statistics in the context
of tied observations and grouping errors. We have also
considered how sensitive the test statistics of Table 2.1 are
to underlying distributions other than the uniform (0,1)
distribution. We propose, as an extension to the present
research, a Monte Carlo study of the power of each of the
test statistics in Table 2.1 for the detection of
discrepancies in the tails of the distribution. For
evaluation purposes, the results obtained can be compared
with those for such well known test statistics as the
Anderson-Darling, Cramér-von Mises and Kolmogorov-Smirnov
statistics. A similar approach has been carried out by Lurie
and Hartley (1971) for their proposed test statistic.
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